scikit-learn: permutation_test_score
Evaluate the significance of a cross-validated score with permutations.
Permutes targets to generate ‘randomized data’ and computes the empirical p-value against the null hypothesis that features and targets are independent.
The returned score is the true score, obtained without permuting targets.
The best possible p-value is 1/(n_permutations + 1), the worst is 1.0.
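The bounds above follow from how an empirical p-value is formed. A minimal sketch, assuming the p-value is (C + 1) / (n_permutations + 1), where C counts the permuted scores that are at least as good as the true score; the helper name empirical_p_value and the example scores are hypothetical and only for illustration:

```python
def empirical_p_value(true_score, permutation_scores):
    """Return (C + 1) / (n_permutations + 1), where C is the number of
    permuted scores that are greater than or equal to the true score."""
    n_permutations = len(permutation_scores)
    n_at_least_as_good = sum(1 for s in permutation_scores if s >= true_score)
    return (n_at_least_as_good + 1) / (n_permutations + 1)


# Hypothetical scores from 4 permutations (made up for the example).
permuted = [0.30, 0.35, 0.40, 0.33]

# No permuted score reaches the true score: best case, 1 / (4 + 1).
best = empirical_p_value(0.95, permuted)

# Every permuted score reaches the true score: worst case, 1.0.
worst = empirical_p_value(0.30, permuted)

print(best, worst)
```

The "+ 1" in numerator and denominator is why the p-value can never be exactly zero, however good the true score is.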
This function implements Test 1 from the paper, which asks:
Has the classifier found a significant class structure, that is, a real connection between the data and the class labels? (From Section 3 of the paper.)
It provides a permutation-based p-value, which represents how likely an observed performance of the classifier would be obtained by chance.
The null hypothesis in this test is that the classifier fails to leverage any statistical dependency between the features and the labels to make correct predictions on left out data.
For reliable results, n_permutations should typically be larger than 100 and cv should use between 3 and 10 folds.
A low p-value provides evidence that the dataset contains real dependency between features and labels and the classifier was able to utilize this to obtain good results.
A high p-value could be due to a lack of dependency between features and labels (there is no difference in feature values between the classes) or because the classifier was not able to use the dependency in the data.
In the latter case, using a more appropriate classifier that is able to utilize the structure in the data would result in a lower p-value.
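The contrast between a low and a high p-value can be sketched by running the test once on real features and once on pure noise. The dataset (iris), the linear SVC, the fold count, the number of noise features, and the seeds are all illustrative assumptions, not choices made by the text above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.svm import SVC

# Illustrative setup (assumptions): iris data, linear SVC, 5 stratified folds.
X, y = load_iris(return_X_y=True)
n_permutations = 100
clf = SVC(kernel="linear", random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Real features: the labels depend on X, so permuting y should hurt the score.
score, perm_scores, pvalue = permutation_test_score(
    clf, X, y, scoring="accuracy", cv=cv,
    n_permutations=n_permutations, random_state=0,
)

# Noise features with the same labels: there is no dependency to exploit.
rng = np.random.RandomState(0)
X_noise = rng.normal(size=(len(y), 20))
score_noise, perm_scores_noise, pvalue_noise = permutation_test_score(
    clf, X_noise, y, scoring="accuracy", cv=cv,
    n_permutations=n_permutations, random_state=0,
)

print(f"iris:  score={score:.3f}  p-value={pvalue:.3f}")
print(f"noise: score={score_noise:.3f}  p-value={pvalue_noise:.3f}")
```

On the real features the p-value should sit at or near its lower bound, while on the noise features the true score should look like just another draw from the permuted scores. The exact numbers depend on the seeds and the classifier.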
Cross-validation provides information about how well a classifier generalizes, specifically the range of expected errors of the classifier.
However, a classifier trained on a high dimensional dataset with no structure may still perform better than expected on cross-validation, just by chance.
This can typically happen with small datasets with fewer than a few hundred samples.
permutation_test_score provides information on whether the classifier has found a real class structure and can help in evaluating the performance of the classifier.
It is important to note that this test has been shown to produce low p-values even if there is only weak structure in the data, because in the corresponding permuted datasets there is absolutely no structure.
This test is therefore only able to show when the model reliably outperforms random guessing.
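Finally, permutation_test_score is computed by brute force and internally fits (n_permutations + 1) * n_cv models.
It is therefore only tractable with small datasets for which fitting an individual model is very fast.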
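To get a feel for the cost, here is a rough count of model fits, assuming one fit per cross-validation fold for the original labels plus one per fold for every permutation. The helper name n_model_fits is hypothetical:

```python
def n_model_fits(n_permutations, n_splits):
    """Rough number of fits: one cross-validation run on the original
    labels plus one cross-validation run per permutation."""
    return (n_permutations + 1) * n_splits


# 100 permutations with 5 folds, and 1000 permutations with 10 folds.
small_run = n_model_fits(100, 5)
large_run = n_model_fits(1000, 10)

print(small_run, large_run)
```

If a single fit takes one second, the larger setting would take close to three hours when run sequentially, which is why fast-fitting models and small datasets are the practical use case.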